SUMMARY

In this paper, parallel processing is used to analyze the mixing and combustion behavior
of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a
slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed
[Moon and Chung, 1996]. Our aim is to extend this work to a three-dimensional domain
using multithreaded domain decomposition parallel processing based on the
flowfield-dependent variation theory [Schunk, Canabal, Heard, and Chung, 1999; Schunk
and Chung, 1999].

Numerical simulations of chemically reacting flows are difficult because of the strong
interactions between turbulent hydrodynamic and chemical processes. The algorithm
must represent the flowfield accurately, since unphysical flowfield calculations
lead to the spurious loss or creation of species mass fraction, or even to premature
ignition, which in turn corrupts the flowfield solution. Another difficulty arises from
the disparity in time scales between the flowfield and the chemical reactions, which may
require finite rate chemistry. The situation becomes more complex when turbulence
introduces a wide disparity in length scales. To cope with these complicated physical
phenomena, we plan to use the flowfield-dependent variation theory mentioned above,
combined with large eddy simulation. Such a computation demands sophisticated
computational strategies, and multithreaded domain decomposition parallel processing
will be necessary to reduce both computational time and storage. Without such
computer-engineering measures, analyzing airbreathing combustion would be
difficult, if not impossible. The parallel processing strategy is described in detail below.

Multi-threaded programming is used to take advantage of multiple computational
elements on the host computer. Typically, a multi-threaded process spawns multiple
threads, which the operating system allocates to the available computational
elements (or processors) within the system. If more than one processor is available, the
threads may execute in parallel, resulting in a significant reduction in execution time. If
more threads are spawned than there are processors, the threads appear to execute
concurrently as the operating system decides which threads run while the others wait.
One unique advantage of multi-threaded programming on shared memory multiprocessor
systems is the ability to share global memory. This removes the need for data exchange
or message passing between threads, since all global memory allocated by the parent
process is available to each thread. However, precautions must be taken to prevent race
conditions, which arise when multiple threads try to write to the same data
simultaneously, and deadlock, which arises when threads wait indefinitely on locks held
by one another.
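
The shared-memory model described above can be illustrated with a minimal Pthreads sketch, assuming a POSIX system. The array size, thread count, and the names `field`, `worker`, and `run_shared_memory_demo` are hypothetical and not taken from the authors' code. Each thread writes only its own slice of a global array, so those writes need no lock, while the single shared accumulator is updated under a mutex to avoid a race condition.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_NODES 1000   /* hypothetical problem size */
#define N_THREADS 4    /* hypothetical thread count */

/* Global memory owned by the parent process; every thread sees it directly,
   so no message passing is needed to share it. */
static double field[N_NODES];
static double global_sum = 0.0;
static pthread_mutex_t sum_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { size_t begin, end; } slice_t;

/* Each thread writes only its own slice of field[] (disjoint, so no lock),
   then adds its partial sum to the shared accumulator under the mutex. */
static void *worker(void *arg) {
    const slice_t *s = arg;
    double partial = 0.0;
    for (size_t i = s->begin; i < s->end; ++i) {
        field[i] = (double)i;
        partial += field[i];
    }
    pthread_mutex_lock(&sum_lock);   /* without the lock, concurrent += could lose updates */
    global_sum += partial;
    pthread_mutex_unlock(&sum_lock);
    return NULL;
}

/* Spawns N_THREADS threads over disjoint slices, joins them, and returns the
   accumulated sum, or -1.0 if a thread could not be created. */
double run_shared_memory_demo(void) {
    pthread_t tid[N_THREADS];
    slice_t slices[N_THREADS];
    size_t chunk = N_NODES / N_THREADS;
    int created = 0;
    assert(chunk > 0);
    global_sum = 0.0;
    for (int t = 0; t < N_THREADS; ++t) {
        slices[t].begin = (size_t)t * chunk;
        slices[t].end = (t == N_THREADS - 1) ? N_NODES : (size_t)(t + 1) * chunk;
        if (pthread_create(&tid[t], NULL, worker, &slices[t]) != 0) break;
        ++created;
    }
    for (int t = 0; t < created; ++t)
        pthread_join(tid[t], NULL);
    return created == N_THREADS ? global_sum : -1.0;
}
```

Calling `run_shared_memory_demo()` should return 499500.0, the sum of 0 through 999, regardless of how the operating system schedules the four threads.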

Threads are implemented by linking an application to a shared library and calling
the routines within that library. Two implementations are widely used: the
Pthreads library (and its derivatives), available on most Unix operating systems,
and the NTthreads library, available under Windows NT. The two implementations differ,
but applications can be ported from one to the other with moderate ease, and many of the
basic functions are similar, albeit with different names and syntax.
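
As an illustration of how closely the two APIs correspond, a thin portability layer can hide the differences behind two functions. This is a sketch under assumptions: the wrapper names `thread_start`, `thread_join`, and `thread_handle` are hypothetical, and on Windows it uses the Win32 calls `CreateThread`, `WaitForSingleObject`, and `CloseHandle`, which we take to correspond to what the text calls the NTthreads library; elsewhere it uses `pthread_create` and `pthread_join`.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
typedef HANDLE thread_handle;
typedef DWORD thread_ret;          /* Win32 thread functions return DWORD */
#define THREAD_CALL WINAPI
#else
#include <pthread.h>
typedef pthread_t thread_handle;
typedef void *thread_ret;          /* Pthreads thread functions return void * */
#define THREAD_CALL
#endif

/* Common signature for a thread entry point on either platform. */
typedef thread_ret (THREAD_CALL *thread_fn)(void *);

/* Starts fn(arg) on a new thread; returns 0 on success, -1 on failure. */
static int thread_start(thread_handle *t, thread_fn fn, void *arg) {
    assert(t != NULL && fn != NULL);
#ifdef _WIN32
    *t = CreateThread(NULL, 0, fn, arg, 0, NULL);
    return (*t == NULL) ? -1 : 0;
#else
    return (pthread_create(t, NULL, fn, arg) == 0) ? 0 : -1;
#endif
}

/* Waits for the thread to finish and releases its resources. */
static void thread_join(thread_handle t) {
#ifdef _WIN32
    WaitForSingleObject(t, INFINITE);
    CloseHandle(t);
#else
    pthread_join(t, NULL);
#endif
}

/* Example entry point, written once for both platforms. */
static int result;
static thread_ret THREAD_CALL double_value(void *arg) {
    result = 2 * *(const int *)arg;
    return 0;
}
```

Application code then calls `thread_start(&t, double_value, &x)` followed by `thread_join(t)` without any platform-specific code; only the wrapper needs to change when porting.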

Domain decomposition methods can be used in conjunction with multi-threaded
programming to create an efficient parallel application. The sub-domains resulting from
the decomposition provide a convenient division of labor for the processing elements
within the host computer. In this application, an Additive Schwarz domain
decomposition method is used. The method is illustrated in Figure 1 for a
two-dimensional square mesh decomposed into four sub-domains. The nodes
belonging to each of the four sub-domains are denoted with geometric symbols, while
boundary nodes are identified with bold crosses. The goal is to solve for each node
implicitly within a single sub-domain. For nodes on the edge of a sub-domain, this is
accomplished by treating the adjacent node in the neighboring sub-domain as a boundary.
The overlapping of neighboring nodes between sub-domains is illustrated in Figure 2.
Greater degrees of overlap, which may improve convergence at the expense of
computation time, are also used.
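
The idea of treating the adjacent node in a neighboring sub-domain as a boundary can be sketched on a one-dimensional model problem. This is an illustration under stated assumptions, not the FDV solver: it solves the discrete Laplace equation on a hypothetical 41-node line split into four sub-domains, solves each sub-domain implicitly with the Thomas (tridiagonal) algorithm, and takes the neighbor values from the previous iterate. Because every sub-domain reads only the previous iterate, the solves are independent, which is what makes this additive (block-Jacobi) form of the Schwarz method suited to threads. The names `solve_subdomain` and `schwarz_solve` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define N_NODES 41   /* hypothetical 1-D grid: nodes 0..40, ends are Dirichlet */
#define N_SUB 4      /* hypothetical number of sub-domains */

/* Implicit solve of u[i-1] - 2u[i] + u[i+1] = 0 on nodes a..b with the Thomas
   algorithm. The neighboring nodes u_old[a-1] and u_old[b+1], which belong to
   adjacent sub-domains (or the physical boundary), act as boundary values. */
static void solve_subdomain(const double *u_old, double *u_new, int a, int b) {
    double cp[N_NODES], dp[N_NODES];   /* modified super-diagonal and rhs */
    int n = b - a + 1;
    for (int k = 0; k < n; ++k) {
        double d = 0.0;                               /* rhs: boundary terms only */
        if (k == 0) d += u_old[a - 1];
        if (k == n - 1) d += u_old[b + 1];
        double m = (k == 0) ? 2.0 : 2.0 + cp[k - 1];  /* diagonal 2, off-diagonals -1 */
        cp[k] = -1.0 / m;
        dp[k] = (k == 0) ? d / m : (d + dp[k - 1]) / m;
    }
    u_new[b] = dp[n - 1];                             /* back substitution */
    for (int k = n - 2; k >= 0; --k)
        u_new[a + k] = dp[k] - cp[k] * u_new[a + k + 1];
}

/* Additive (block-Jacobi) Schwarz sweeps: every sub-domain reads only the
   previous iterate u, so the N_SUB solves are independent and could be given
   to separate threads. Returns the number of sweeps used, or -1. */
int schwarz_solve(double *u, double tol, int max_sweeps) {
    double u_new[N_NODES];
    const int interior = N_NODES - 2;
    assert(interior >= N_SUB);
    for (int sweep = 1; sweep <= max_sweeps; ++sweep) {
        memcpy(u_new, u, sizeof u_new);               /* carries the Dirichlet ends */
        for (int s = 0; s < N_SUB; ++s) {
            int a = 1 + s * interior / N_SUB;         /* first owned node */
            int b = (s + 1) * interior / N_SUB;       /* last owned node */
            solve_subdomain(u, u_new, a, b);
        }
        double change = 0.0;
        for (int i = 0; i < N_NODES; ++i) {
            double diff = u_new[i] - u[i];
            if (diff < 0.0) diff = -diff;
            if (diff > change) change = diff;
        }
        memcpy(u, u_new, sizeof u_new);
        if (change < tol) return sweep;
    }
    return -1;
}
```

With `u[0] = 0`, `u[40] = 1`, and zero interior values as the initial guess, the iteration converges to the linear solution u[i] = i/40. A wider overlap would typically reduce the number of sweeps at the cost of larger sub-domain solves, consistent with the trade-off noted in the text.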

In a parallel application, load balancing between processors is critical to achieving 
optimum performance. Ideally, if a domain could be decomposed into regions requiring 
an identical amount of computation, it would be a simple matter to divide the problem 
between processing elements as shown in Figure 3 for four threads executing on an equal 
number of processors. 

Unfortunately, in a “real world” application the domain may not decompose so that
the computation for each processor is balanced, which reduces efficiency. If the
execution times of the sub-domains are not identical, the CPUs become idle
for portions of time, as shown in Figure 4.
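
The cost of this idle time can be expressed as a parallel efficiency. Assuming one sub-domain per processor, with T_i the execution time of sub-domain i on P processors, the wall-clock time is set by the slowest sub-domain. The times in the worked example below are hypothetical.

```latex
E = \frac{\sum_{i=1}^{P} T_i}{P \,\max_{i} T_i},
\qquad
T = (1.0,\ 0.8,\ 0.6,\ 0.6) \;\Rightarrow\; E = \frac{3.0}{4 \times 1.0} = 0.75
```

In this example, a quarter of the available processor time is spent idle.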

One approach to load balancing, as implemented in this application, is to decompose the 
domain into more sub-domains than available processors and use threads to perform the 
computations within each block. The finer granularity permits a more even distribution 
of work amongst the available processing elements as shown in Figure 5. 
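
The benefit of finer granularity can be checked with a small deterministic simulation, assuming each sub-domain's cost is known in advance (in practice it is not, which is why the threads pick up work dynamically). The function `simulate_makespan` and the cost values are hypothetical; each sub-domain, taken in order, goes to whichever processor becomes free first.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 64

/* Simulates the "assembly-line" schedule: sub-domains are taken in order and
   each goes to the processor that becomes free first (lowest index on ties).
   Returns the makespan, i.e. the wall-clock time in the units of cost[]. */
int simulate_makespan(const int *cost, size_t n_sub, int n_procs) {
    assert(n_procs > 0 && n_procs <= MAX_PROCS);
    int finish[MAX_PROCS] = {0};
    for (size_t k = 0; k < n_sub; ++k) {
        int p_free = 0;
        for (int p = 1; p < n_procs; ++p)
            if (finish[p] < finish[p_free]) p_free = p;
        finish[p_free] += cost[k];
    }
    int makespan = 0;
    for (int p = 0; p < n_procs; ++p)
        if (finish[p] > makespan) makespan = finish[p];
    return makespan;
}
```

For example, with four processors the costs {8, 6, 4, 6} give a makespan of 8, an efficiency of 24/(4 × 8) = 0.75. Splitting each sub-domain in half, {4, 4, 3, 3, 2, 2, 3, 3}, reduces the makespan to 7, an efficiency of about 0.86, for the same total work.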

In this approach, the number of threads spawned equals the number of available
processors, and each thread marches through the available sub-domains (which
preferably number at least twice the number of processors), solving one at a time in
an “assembly-line” fashion. A stack is employed from which each thread pops the next
sub-domain to be solved. Mutual exclusion locks protect the stack pointer in case two
or more threads access the stack simultaneously. Each thread remains busy until the
sub-domains are exhausted. If the number of sub-domains is large enough, the degree of
parallelism will be high, although decomposing a problem into too many sub-domains
may adversely affect convergence.
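
The stack-based scheme can be sketched with Pthreads as follows, assuming a POSIX system. The sub-domain count, thread count, and the names `pop_subdomain`, `solve_subdomain`, and `run_subdomain_stack` are hypothetical; `solve_subdomain` only records which thread handled each sub-domain, standing in for the actual implicit solve. The mutex guards only the stack pointer, so the solves themselves run in parallel.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define N_SUBDOMAINS 16   /* hypothetical: at least twice the thread count */
#define N_THREADS 4       /* hypothetical: one thread per processor */

/* Software stack of sub-domain indices shared by all threads. */
static int stack[N_SUBDOMAINS];
static int stack_top = 0;                 /* number of entries on the stack */
static pthread_mutex_t stack_lock = PTHREAD_MUTEX_INITIALIZER;

/* Records which thread solved each sub-domain (illustration only). */
static int solved_by[N_SUBDOMAINS];

/* Pops the next sub-domain under the mutex; returns -1 when exhausted. */
static int pop_subdomain(void) {
    int id = -1;
    pthread_mutex_lock(&stack_lock);
    if (stack_top > 0) id = stack[--stack_top];
    pthread_mutex_unlock(&stack_lock);
    return id;
}

/* Hypothetical stand-in for the implicit solve on one sub-domain. */
static void solve_subdomain(int id, int thread_no) {
    solved_by[id] = thread_no;
}

/* Each thread stays busy until the stack is empty. */
static void *worker(void *arg) {
    int thread_no = *(const int *)arg;
    int id;
    while ((id = pop_subdomain()) >= 0)
        solve_subdomain(id, thread_no);
    return NULL;
}

/* Pushes every sub-domain, spawns the threads, and joins them.
   Returns 0 on success, -1 if any thread could not be created. */
int run_subdomain_stack(void) {
    pthread_t tid[N_THREADS];
    int thread_no[N_THREADS];
    int created = 0;
    for (int i = 0; i < N_SUBDOMAINS; ++i) {   /* push every sub-domain */
        stack[i] = i;
        solved_by[i] = -1;
    }
    stack_top = N_SUBDOMAINS;
    for (int t = 0; t < N_THREADS; ++t) {
        thread_no[t] = t;
        if (pthread_create(&tid[t], NULL, worker, &thread_no[t]) != 0) break;
        ++created;
    }
    for (int t = 0; t < created; ++t)
        pthread_join(tid[t], NULL);
    if (created == 0) return -1;
    assert(stack_top == 0);                    /* all work consumed */
    return created == N_THREADS ? 0 : -1;
}
```

After `run_subdomain_stack()` returns 0, every entry of `solved_by` holds the index of some thread; which thread solved which sub-domain varies from run to run, since it depends on scheduling.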

The above approach will be used for the analysis of hypersonic airbreathing
combustion with hydrogen fuel. Without parallel processing, an analysis at a
Mach 4 free-stream velocity with finite rate chemistry for 18 species has been carried
out, as shown in Figs. 6 through 8 [Moon and Chung, 1996]. That study used the standard
k-ε turbulence model. The proposed paper will use large eddy simulation at
high Mach numbers and high Reynolds numbers, and we plan to report the maximum
ranges of Mach number and Reynolds number the proposed algorithm can accommodate.
Our eventual goal in this paper is to determine the relationship between mixing and
combustion efficiency. The length scales and time scales involved in turbulence and
combustion will be thoroughly investigated.
Figure 1: Multiple Subdomains
Figure 2: Domain Decomposition

Flow-field Dependent Variation Approach
Development of the FDV Equations
Test Case for Global Hydrogen/Air Combustion Model [1]

[1] Rogers, R.C. and Chinitz, W., “Using a Global Hydrogen-Air Combustion Model in Turbulent Reacting Flow Calculations”, AIAA Paper 82-0112, January 1982.

Density and Temperature Contours for Non-reacting Flow-field

Parallel Programming: Processes and Threads
Threads share memory (within the same process), alleviating the need for process-level
communication.

Multi-threaded Programming

Domain Decomposition
Additive Schwarz Method with Overlapping Sub-domains
Each interior node is solved in an implicit fashion in exactly one sub-domain.

Processor Load Balancing for the Ideal Case

Processor Load Balancing in the “Real World”

Multi-threaded Programming Implementation
Decompose the domain; push each sub-domain onto a software stack.
Spawn threads and execute until the stack is exhausted.

Computational Benchmarks 

Conclusions and Future Plans 